What makes a phase transition? Analysis of the 
random satisfiability problem



Katharina A. Zweig^a, Gergely Palla^b, Tamás Vicsek^{a,b}

^a Department of Biological Physics, Eötvös University, 1117 Pázmány P. Stny 1/A,
Budapest, Hungary

^b Statistical and Biological Physics Research Group of HAS, Eötvös University, 1117
Pázmány P. Stny 1/A, Budapest, Hungary



Abstract 

Over the last 30 years it has been found that many combinatorial systems undergo
phase transitions. One of the most important examples of these can be found
among the random k-satisfiability problems (often referred to as k-SAT), which
ask whether there exists an assignment of Boolean values satisfying a Boolean
formula composed of clauses with k random variables each. The random 3-SAT
problem is reported to show various phase transitions at different critical
values of the ratio of the number of clauses to the number of variables. The
most famous of these occurs when the probability of finding a satisfiable
instance suddenly drops from 1 to 0. This transition is associated with a rise in
the hardness of the problem, but until now the correlation between any of the
proposed phase transitions and the hardness has not been totally clear. In this
paper we will first show numerically that the number of solutions universally
follows a lognormal distribution, thereby explaining the puzzling question of why
the number of solutions is still exponential at the critical point. Moreover, we
provide evidence that the hardness of the closely related problem of counting the
total number of solutions does not show any phase transition-like behavior. This
raises the question of whether the probability of finding a satisfiable instance
is really an order parameter of a phase transition or whether it is more likely
to just show a simple sharp threshold phenomenon. More generally, this paper
aims at starting a discussion of where a simple sharp threshold phenomenon turns
into a genuine phase transition.

1. Introduction



The analysis of phase transitions and the associated microscopic structures 
is a well-developed scientific approach in physics. In real systems, the obser- 
vation of phases and their different macroscopic behavior comes first, and a 
subsequent analysis reveals how the structure of one phase is transformed into 
the structure of the other phase. This transition is associated with the change 
of a so-called control parameter, such as the temperature. Most interesting are 
abrupt changes in functions measuring the macroscopic behavior, e.g., the den- 
sity or heat capacity, that happen with small changes in the control parameter. 
The function showing the non-analytic behavior or singularities is the order pa- 
rameter of the system and can be seen as a fingerprint of the underlying phase 
transition. Starting with the analysis of random graphs [3] and simple perco-
lation models [23], [3], combinatorial objects came into the focus of statistical
physicists. A thorough analysis revealed that these simple systems also show 
phase transitions. 

Whereas in percolating systems the phases and their different behaviors are
visually accessible, this is not the case for other combinatorial systems with
a proposed phase transition. One of the most important of these systems is
the so-called satisfiability problem (SAT). Given some Boolean formula, it asks
whether there exists an assignment of Boolean values to its variables such that
it is satisfied, i.e., such that it evaluates to true. SAT problems belong to the
set of NP-hard problems, i.e., so far there is no algorithm to solve them in
polynomial time. As with many other NP-hard problems, satisfiability problems
arise not only in theory but also in industry, e.g., in automotive configuration
[22], in software and hardware design [12], biological sciences [7], and
artificial intelligence [1]. Since satisfiability problems are so abundant,
understanding when and why they are hard and developing better algorithms is
crucial. A classic family for analyzing the hardness is the random k-SAT family,
in which the k variables of each clause are drawn uniformly at random and without
repetition from the set of all variables. Each variable is negated with
probability 0.5. The ratio between the number of clauses m and the number of
variables n, denoted by α = m/n, parameterizes the probability P[UNSAT] of
finding an unsatisfiable instance at a given α. It was observed early 0, [l3 that
plotting P[UNSAT] against α shows a sharp threshold behavior at some critical
α_c. Furthermore, around this α_c it also takes various algorithms the longest
time to solve random 3-SAT problems, i.e., the problems are hard. To quantify the
hardness, either the number of distinct steps of the solving algorithm is
counted, or simply the time is measured until the problem is solved. The
divergence of the hardness together with the sudden jump of P[UNSAT] at some
critical α resembles a phase transition-like behavior [21]. The numerical
analysis of this sharp threshold behavior resulted in α_c = 4.15 ± 0.05 [l^].
Kirkpatrick and Selman could also show that there is a non-trivial finite size
effect, i.e., that the width of the window in which the transition takes place is
proportional to n^{-2/3} for 3-SAT. It is thought that this sharp threshold
phenomenon is of first order, i.e., in the limit of infinite system size
P[UNSAT] = 0 for α < α_c and P[UNSAT] = 1 for α > α_c. For 2-SAT, this could be
rigorously shown, but for all k ≥ 3 it is an open question. Note that 2-SAT
itself is not NP-hard. To analyze the nature of this sharp threshold behavior,
the k-SAT problem can also be represented as a spin-glass model, and different
theoretical analyses have arisen from this approach [3, 15, 16]. Since these
theoretical analyses rely on the thermodynamical limit whereas numerical
approaches can only tackle system sizes of up to 100 or may even be restricted to
system sizes below 40 (depending on the specific question), it is not surprising
that none of the theoretical approaches matches the numerical value of
α_c = 4.15 ± 0.05. The approach that comes closest is based on the analysis of
survey propagation, which results in α_c = 4.267, a value that is believed to be
exact [16]. The applied order parameter is very technical and it is difficult to
analyze how it relates to P[UNSAT].

To find out more about the behavior of 3-SAT, we first repeated the experiment
of Kirkpatrick and Selman, and increased the then available system size
from 100 to 200. A subsequent finite size scaling is much more in accordance
with the old value of α_c = 4.15 ± 0.05 than with α_c = 4.267. In the second
step, we aim at understanding a different parameter, namely the entropy of the
system, i.e., the logarithm of the number of solutions a satisfiable instance has.
It was shown by an approach from statistical physics that the entropy is still
finite at α_c, i.e., the number of solutions is still exponential. Monasson
and Zecchina state that "hence (...) the transition itself is due to the abrupt
appearance of logical contradictions in all solutions and not to the progressive
decreasing of the number of these solutions down to zero." Such a sudden
emergence of logical contradictions on a macroscopic level would be a good sign
of a genuine phase transition.

In this paper we give numerical evidence that the explanation for the finite
entropy at α_c is far simpler, namely that the number of solutions of
satisfiable instances is universally described by a lognormal distribution over a
range of different system sizes and 4.0 < α < 4.5. This means that, although
many of the instances are already unsatisfiable at α_c, some of the satisfiable
instances have a large number of solutions left, which accounts for the high
average number of solutions. A lognormal distribution can be the result of the
iterative application of a factor drawn from some distribution. This raises the
question of whether the phase transition of P[UNSAT] may be only a sharp
threshold phenomenon that is not based on the non-trivial restructuring of
interacting entities. In the following we will first discuss our numerical
findings regarding the average number of solutions, then give an alternative
explanation for the rise of the hardness at α_c, and finally discuss some simple
models with different kinds of sharp threshold phenomena. The last model shows
qualitatively the same behavior as P[UNSAT].

In summary, we do not attack the idea that k-SAT shows phase transitions
in general, but we put on display some simple explanations and models that
raise doubt about whether the proposed phase transition of P[UNSAT] is more
than a simple sharp threshold phenomenon. In general, the once obvious border
between first-order and continuous phase transitions and their respective
properties has become so blurred that scientists from neighboring disciplines,
e.g., computer scientists or chemists and even statistical physicists not
specialized in spin-glasses, have difficulty finding out which properties
define a phase transition. Our main contribution in this paper is thus the set
of toy models mentioned above, which are so simple that they cannot be considered
to have a genuine phase transition. Still, they mimic some important properties
of the 3-SAT system. With this we would like to open a discussion with
the spin-glass community to understand what differentiates the simple models
from 3-SAT and what exactly makes a phase transition. The paper thus aims
at starting a discussion of the difference between a mere sharp threshold
phenomenon and a genuine phase transition. We hope that a discussion of what
properties are required for acknowledging a phase transition will help to support
the interdisciplinary discussion in this area.

The paper is organized as follows: After giving some definitions in Sec. 2, we
will discuss in Sec. 3 the question of whether the sharp threshold phenomenon
of P[UNSAT] is directly caused by a continuous phase transition of an order
parameter related to P[UNSAT]. We will furthermore discuss whether there is
any evidence at all for the existence of two different phases. Sec. 4 finally
introduces two simple statistical models that show similar phase transition-like
behavior without any underlying interacting elements. The first one is clearly
trivial, while the second shows a non-trivial finite size scaling effect. From
these models, we develop a simple toy model that qualitatively shows the same
properties as P[UNSAT] in random 3-SAT. Finally, we discuss our findings in
Sec. 5.

2. Definitions 

Let V be a set of n variables {v_1, ..., v_n}. Each variable has two literals,
a positive literal denoted by v_i and a negated literal denoted by ¬v_i.
A Boolean formula in conjunctive normal form (CNF) consists of m subsets
of or-connected literals, called clauses or constraints. The clauses are
and-connected. An assignment is a function a : V → {true, false} that assigns
each variable a Boolean value, i.e., true or false. With a given Boolean formula
in CNF and a given assignment, the formula can be evaluated: a positive literal
evaluates to true if its variable is assigned true, and to false if it is
assigned false. A negated literal evaluates to true if its variable is assigned
false, and to false otherwise. A clause evaluates to true if at least one of its
literals evaluates to true, and the whole formula evaluates to true if all
clauses evaluate to true. The satisfiability problem, or SAT problem for short,
asks whether a given Boolean formula has at least one assignment such that it
evaluates to true. Such an instance is called satisfiable (sat), and one where no
satisfying assignment can be found is called unsatisfiable (unsat). If all
clauses contain k literals, we speak of k-SAT. If, moreover, the instance is
created by choosing the k literals uniformly at random without repetition, we
speak of random k-SAT. The ratio between the number of clauses m and the number
of variables n in a random k-SAT instance is denoted by α.

For any two assignments a and a', the Hamming distance d(a, a') is defined
as the number of variables to which a and a' assign different values.
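
The definitions above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the encoding of a literal as a signed integer (+i for the positive literal of variable i, -i for its negation) and the function names `random_ksat`, `evaluate` and `hamming` are choices made here, not notation from the paper.

```python
import random


def random_ksat(n, m, k=3, rng=None):
    """Draw a random k-SAT instance with n variables and m clauses.

    Each clause picks k distinct variables uniformly at random and negates
    each of them with probability 0.5. Literals are signed integers
    (an encoding assumed for this sketch).
    """
    rng = rng or random.Random()
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)  # without repetition
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def evaluate(clauses, assignment):
    """Evaluate a CNF formula; `assignment` maps variable index -> bool.

    A clause is true if at least one literal is true, and the formula is
    true if all clauses are true.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def hamming(a, b):
    """Number of variables on which the assignments a and b differ."""
    return sum(a[v] != b[v] for v in a)
```

For example, `random_ksat(n, round(alpha * n))` draws an instance at a chosen clause-to-variable ratio `alpha`.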

The SAT problem can be solved by different algorithms, the most widely
used being based on the following scheme, first proposed by Davis et al. [6].
It is a kind of trial-and-error procedure in which a growing subset of variables
is assigned Boolean values until we either find a solution or encounter a
contradiction. In each step, take one of the variables that is yet unassigned and
assign either true or false to it. Say, variable v_i is assigned true. Now, the
instance can be simplified by (temporarily) removing all clauses which contain
the positive literal of v_i since they are already satisfied. Furthermore, we can
temporarily remove the negated literal from all clauses since it cannot contribute
to the satisfaction of the clauses it is contained in. If after this step all
clauses have been removed, we have found a solution to the problem. If we
encounter an empty clause, all of its originally contained variables have been
assigned the wrong value and thus we have found a contradiction. In this case, we
have to backtrack and restore the instance up to the point where v_i was
unassigned. Then, the same procedure is tried, but assigning false to v_i. As long
as there is no solution and no contradiction in the simplified instance, we
simply proceed with the partial assignment. If all decisions lead to
contradictions, the instance is unsatisfiable. There are many improvements to
this basic scheme, e.g., specifying an order in which the variables are assigned
[ist and learning [14]. One basic improvement is unit propagation, i.e., whenever
a clause has only one literal left, it can only be satisfied when the variable's
assignment is set accordingly. Note that the assignment of such a variable is
called a dependent decision, while the assignment of Boolean values to all other
so-called free variables is called an independent decision.
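
The scheme described above can be sketched in a few lines. This is a minimal illustration with unit propagation and a naive branching order, assuming literals are encoded as signed integers; it is not the solver used for the experiments in this paper, and the names `simplify` and `dpll` are hypothetical.

```python
def simplify(clauses, literal):
    """Assign `literal` true: drop satisfied clauses and remove the
    opposite literal from the remaining ones."""
    reduced = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is already satisfied
        reduced.append(tuple(l for l in clause if l != -literal))
    return reduced


def dpll(clauses, assignment=None):
    """Return a satisfying (possibly partial) assignment {variable: bool},
    or None if the instance is unsatisfiable."""
    assignment = dict(assignment or {})
    # Unit propagation: these are the dependent decisions.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None  # empty clause: contradiction, caller backtracks
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
    if not clauses:
        return assignment  # all clauses removed: solution found
    # Independent decision: try true first, then false.
    variable = abs(clauses[0][0])
    for literal in (variable, -variable):
        result = dpll(simplify(clauses, literal),
                      {**assignment, variable: literal > 0})
        if result is not None:
            return result
    return None  # both branches failed
```

Variables that do not appear in the returned dictionary are free, i.e., either value satisfies the formula.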

3. Random 3-SAT 

It is well known that 3-SAT belongs to the set of the so-called NP-hard
problems, i.e., problems for which so far no algorithm with polynomial runtime
has been found. In the worst case, finding a solution to these problems
can take exponential time such that even relatively small instances cannot be 
solved within months. On the other hand, many real-world SAT problems can 
be solved in a short time despite their huge size. Since this behavior is not well 
understood, research has been dedicated to understanding why and how hard 
instances emerge and what their structure looks like. 



It was observed early [17], |J| that plotting P[UNSAT] against α shows a
sudden jump at some value α_c independent of the system size n. Furthermore,
around this value it also takes various algorithms the longest time to solve
random 3-SAT problems. This divergence of the hardness and the sudden jump
of P[UNSAT] at some universal α resembles a phase transition-like behavior
[21]. In their classic paper from 1994, Kirkpatrick and Selman used the
well-understood model of percolation in growing random graphs and the techniques
deployed in this area for the identification of critical phenomena in random
3-SAT: "We use finite-size scaling, a method from statistical physics in which the
observation of how the width of a transition narrows with increasing sample size
gives direct evidence for critical behavior at a phase transition." They scaled
the curves for different k according to n^ν (α − α_c)/α_c and evaluated α_c to
be 4.15 ± 0.05 and the critical exponent ν for k = 3 to be 2/3 [l^. Today a
value of α_c = 4.267 is often cited for the P[UNSAT] threshold, but plotting
P[UNSAT] against the rescaled parameter y = n^{2/3}(α − 4.12)/4.12 yields a much
better scaling than that for the rescaled parameter y = n^{2/3}(α − 4.267)/4.267
(see Figure 1). The reason for this mismatch is not totally clear. It could be due
to the still quite small system size in our experiments. 

In this paper we suggest that the observed threshold phenomenon of P[UNSAT]
is not so much a sign of criticality but simply caused by the law of large numbers. 
In general it is not easy to prove that an observed sharp threshold behavior is 
not caused by the critical behavior associated with a phase transition since there 
are many possible interactions that could be causing it. In the next section we 
will first analyze the typical number of solutions, which is closely related to the 
entropy of the system. 

3.1. Number of solutions 

A first-order phase transition is deeply connected to a sudden increase in
order. For example, when water freezes the molecules are fitted into a neat
structure that shows high order. It is difficult to see intuitively what kind of
order is measured by P[UNSAT]. However, when a continuous phase transition is
studied using an existence parameter instead of a quantitative parameter, it may
seem to be rather like a first-order transition, as we will exemplify in the case
of site percolation in 2D. Here, one can ask about the behavior of two different
but related parameters: "Is there a biggest connected component (BCC) of size
O(n)?" (this is the existence parameter) or "What is the size of the BCC?" (and
this is the quantitative parameter). Plotting the relative size of the BCC shows
a continuous phase transition at some critical value, i.e., the quantitative
parameter reveals the complex behavior of the system. At the critical value, a
finite fraction of all vertices is spanned by the BCC, i.e., it has size O(n).
Since the existence parameter just asks for the existence of a BCC with size
O(n), it will trivially show a first-order phase transition-like behavior at the
same value [i^. Thus, in this system, the seemingly first-order phase
transition-like behavior of the existence parameter is just a trivial implication
of the true continuous phase transition concerning the quantitative parameter.
Since P[UNSAT] asks whether there exists a solution or not, we first analyzed
whether the seemingly first-order phase transition of P[UNSAT] also belongs to
this type, i.e., whether it is an indicator of a more complex continuous phase
transition of a related quantity like the behavior of the number of solutions.

An instance is unsat if and only if it has no solution; this is a typical
existence parameter. A possible quantitative parameter of which this existence
parameter could be an indicator is the average number of solutions. The
logarithm of this quantity is the entropy of the system at a given α. Figure 2
shows that the average number of solutions ⟨s⟩ can be fitted to a simple
exponential law, i.e.,

\langle s \rangle \approx 2^{n} \left( \frac{7}{8} \right)^{m}. \qquad (1)

This simple behavior of the average number of solutions coincides with the
so-called annealed estimate of the number of solutions, which is based on the
fact that any solution will be 'killed' with a probability of 1/8 by a clause
drawn uniformly at random. But although this estimate has been used for a long
time, it is surprising that the average number of solutions follows it so closely
since it does not take into account that in reality the probabilities of
solutions being deleted are dependent: two very similar solutions have a higher
probability of being killed by the same constraint, whereas two solutions that
assign the opposite values to variables can never be killed by the same
constraint. Thus, it is still surprising that the average number of solutions
universally follows this simple law for all system sizes. Furthermore, Figure 5
reveals that at α_c there is, on average, still an exponential number of
solutions although we know that the probability of finding a satisfiable instance
drops to zero for large system sizes. This has also been proven rigorously. It is
clear that without the gap between the critical value α_c and the point α = 5.19
where the average number of solutions becomes 1, there would not have been much
interest in the seemingly critical behavior of P[UNSAT].
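
The point α = 5.19 follows directly from Eq. (1): the annealed average 2^n (7/8)^{αn} equals 1 when α = ln 2 / ln(8/7). The sketch below evaluates this; the generalization to arbitrary k (each clause kills a fraction 2^{-k} of all assignments) and the function names are assumptions made for illustration.

```python
import math


def annealed_mean_solutions(n, m, k=3):
    """Annealed estimate 2^n * (1 - 2^-k)^m of the average number of solutions."""
    return 2.0 ** n * (1.0 - 2.0 ** -k) ** m


def annealed_threshold(k=3):
    """Ratio alpha at which the annealed estimate equals 1,
    i.e. the solution of 2 * (1 - 2^-k)^alpha = 1."""
    return math.log(2.0) / -math.log(1.0 - 2.0 ** -k)


print(round(annealed_threshold(3), 2))  # 5.19
```

At α = 4.15 and n = 100 the same estimate is still of the order of 10^6, which illustrates the gap discussed above.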

The only possibility to achieve an exponential average number of solutions at
α_c and P[UNSAT] → 1 for n → ∞ is to have a strongly right-skewed distribution
of the number of solutions an instance has. Indeed, as Figure 3 shows, the
distribution of the number of solutions of satisfiable instances displays a
universal behavior. Over an interval of α = 4.0 to 4.5 and different system
sizes n = 30 to 100, the cumulative distribution of the number of solutions s of
satisfiable instances, P(s), can be fitted by the cumulative distribution of a
lognormal distribution given as

P(s) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left[ \frac{\ln(s) - \mu}{\sigma \sqrt{2}} \right], \qquad (2)

where μ and σ correspond to the mean and the standard deviation of ln(s), and
erf denotes the error function.
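
Eq. (2) is straightforward to evaluate numerically. The sketch below is illustrative only: estimating μ and σ from the sample moments of ln(s) is an assumption made here and not necessarily the fitting procedure used for Figure 3; the function names are hypothetical.

```python
import math


def lognormal_cdf(s, mu, sigma):
    """Cumulative lognormal distribution P(s) as in Eq. (2)."""
    return 0.5 + 0.5 * math.erf((math.log(s) - mu) / (sigma * math.sqrt(2.0)))


def fit_lognormal(solution_counts):
    """Estimate (mu, sigma) as mean and standard deviation of ln(s).

    `solution_counts` holds the numbers of solutions (all >= 1) of the
    satisfiable instances in a sample.
    """
    logs = [math.log(s) for s in solution_counts]
    mu = sum(logs) / len(logs)
    variance = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(variance)
```

By construction, `lognormal_cdf(math.exp(mu), mu, sigma)` equals 0.5, i.e., exp(μ) is the median number of solutions, while the mean exp(μ + σ²/2) can be much larger for a strongly skewed distribution.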

The lognormal distribution of s explains that there is no need of a sudden
drop of ⟨s⟩ at α_c since the average is dominated by some instances with
a high number of solutions, although most instances are already unsatisfiable.
In summary, neither the typical number of solutions nor its distribution shows
critical behavior around α_c. Since we know now that the distribution of s
is highly skewed, another intuitive measure is the quenched average, i.e., the
average ⟨log(s + 1)⟩ of the logarithm of the number of solutions, shown in
Figure 4. Note that this also does not show any interesting behavior around
α_c = 4.15.

In summary, it does not seem to be the case that the sharp threshold
phenomenon of P[UNSAT] is the simple indicator of a related, continuous phase
transition of a quantitative measure.

3.2. Are there two different phases in k-SAT?

This leads us back to the question of whether we really have two phases in
this system, one consisting of satisfiable instances and one consisting of
unsatisfiable instances. In k-SAT, the main problem is that we cannot observe two
different phases by eye. In this special case, the sharp threshold behavior had
been observed first, and this led to the definition of the "phases" instead of
observing and defining the phases first before analyzing the transition between
them. This happened because the sharp threshold phenomenon divided the
instances into two different groups that match our intuition. Maybe, however, an
unsatisfiable instance is just an instance with 0 solutions and not substantially
different from an instance with exactly 1 solution. The question is thus whether
the two 'phases' are just a differentiation that is convenient for computer
scientists or whether they relate to a small structural change in some
interaction on a microscopic scale that leads to a huge change in macroscopic
behavior.

Hardness has been used to argue that there are two different phases, since
it shows a diverging behavior around α_c. Of course, hardness, measured as the
number of independent decisions of a DPL-like algorithm [6] or simply by the
runtime, depends on the specific implementation. Nonetheless, the basic picture
is always the same, namely that it peaks around α_c.¹ The question is whether
this maximum is genuine or directly dependent on the definition of a satisfiable
and an unsatisfiable instance. We will give evidence here that the occurrence of
a maximal runtime around α_c is directly implied by the definition of a decision
algorithm. The problem is that a decision algorithm does different things in the
two cases: if it runs on a satisfiable instance, it stops after the first solution
is encountered. Otherwise, a proof has to be given that no solution exists. For
DPL-like algorithms this means that in the first case only some fraction of
the whole decision tree has to be searched, while for unsatisfiable instances the
whole tree has to be traversed. We can assume two things:

1. the decision trees of typical satisfiable and typical unsatisfiable instances
at a given α are of approximately the same size;

2. the locations of the solutions in the leaves of the tree are uniform.

Thus, let the size of a typical decision tree at a given α be denoted by t(α).
Even if an instance has just one solution, we will on average traverse only half
of the tree to find it. For an unsatisfiable instance at the same α, we will on
average take double the time, since the whole tree has to be traversed to show
that there is no solution. Since at α_c there are more unsatisfiable than
satisfiable instances, this is already an explanation for the increasing runtime
at α_c. Of course, the behavior of the average hardness is a bit more
complicated than this. The average hardness h(α) can be dissected into
h_sat(α) and h_unsat(α), the hardness of satisfiable and unsatisfiable instances
at α. With this,

h(\alpha) = (1 - P[\mathrm{UNSAT}])\, h_{\mathrm{sat}}(\alpha) + P[\mathrm{UNSAT}]\, h_{\mathrm{unsat}}(\alpha). \qquad (3)
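
As a rough worked illustration of Eq. (3) under the two assumptions above (a simplification for intuition, not a fit to the measured data): if satisfiable instances have a single solution, so that about half of a tree of size t(α) is traversed, and unsatisfiable instances require the full tree, then

```latex
h(\alpha) \approx \bigl(1 - P[\mathrm{UNSAT}]\bigr)\,\frac{t(\alpha)}{2}
              + P[\mathrm{UNSAT}]\,t(\alpha)
          = \frac{t(\alpha)}{2}\,\bigl(1 + P[\mathrm{UNSAT}]\bigr).
```

In this simplified picture the average hardness rises from t(α)/2 to t(α) as P[UNSAT] goes from 0 to 1, purely because the decision algorithm stops early on satisfiable instances.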



¹ Note that the maximum itself is difficult to locate and might also shift with n.

Note that the hardness h_unsat(α) is simply given by the average size t_unsat(α)
of the decision tree of unsatisfiable instances at α. While h_unsat(α) = t_unsat(α)
seems to be a simple, exponentially decreasing function of α (see Figure 5a),
h_sat(α) is at a maximum around α_c (see Figure 5b). h_sat(α) can be
approximated as the product of t_sat(α), the size of the average decision tree of
satisfiable instances at α, and the average fraction of the decision tree that is
traversed before a solution is found. While the first is decreasing with α, the
latter is increasing with α. Thus, the maximum around α_c in h_sat(α) is
introduced artificially by stopping after the first solution is encountered. If
we instead look at the runtime of an algorithm that counts the number of all
solutions an instance has, we see no singularity of the hardness around α_c, as
Figure 5 shows. We thus conclude that the hardness supports the view that there
are no two phases since the size of the decision tree decreases smoothly with
growing α, at least for the system sizes that could be computed.
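
The difference between deciding and counting can be made explicit with a small sketch of a counting variant of the backtracking scheme. It never stops at the first solution but always explores both branches; it is a toy illustration assuming signed-integer literals, not the counting algorithm used for the measurements, and the function names are hypothetical.

```python
def simplify(clauses, literal):
    """Assign `literal` true: drop satisfied clauses and remove the
    opposite literal from the remaining ones."""
    return [tuple(l for l in c if l != -literal)
            for c in clauses if literal not in c]


def count_solutions(clauses, n_free):
    """Count satisfying assignments over `n_free` still unassigned variables."""
    if any(len(c) == 0 for c in clauses):
        return 0  # contradiction: no solution below this node
    if not clauses:
        return 2 ** n_free  # every remaining variable is free
    variable = abs(clauses[0][0])
    # Both branches are always explored, for sat and unsat instances alike.
    return sum(count_solutions(simplify(clauses, lit), n_free - 1)
               for lit in (variable, -variable))
```

For a single 3-clause over three variables, `count_solutions([(1, 2, 3)], 3)` returns 7, i.e., the clause removes one of the 8 assignments.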

Summarizing the results so far, we could not find a measure which is related 
to the existence question measured by P[UNSAT] and which shows a continu- 
ous phase transition. We also did not find any measure that is independent of 
P[UNSAT] and therefore proves that indeed an unsatisfiable instance is struc- 
turally different from an instance with 1 solution. Instead, we will now present 
results from two very simple statistical systems that show a sharp threshold 
phenomenon. We will then use these systems to develop a simple toy model 
that shows qualitatively the same behavior as 3-SAT and shows quite clearly 
that no phase transition is needed to produce a 3-SAT-like system. 



4. Sharp threshold phenomena in simple statistical systems 

In this section we discuss two simple stochastic processes. The first one is
a simple coin tossing example that is discussed in Sec. 4.1, and the second is a
statistical problem, called the coupon collector's problem, discussed in Sec. 4.2.



4.1. Throwing a Biased Coin

In the book Computational complexity and statistical physics, the editors
briefly discuss the question of whether sharp thresholds are more than just an
effect of the law of large numbers. They contrast SAT with the following simple
system [21, p. 8]: a biased coin is tossed that shows heads with probability β and
tails with probability 1 − β. Let an instance consist of n tosses and let n define
the system size. We expect the chance P[#heads > #tails] to see more heads
than tails in one of these instances to change from 0 for β < 0.5 to 1 for
β > 0.5 with an ever-increasing sharpness with growing n. With this example,
Percus et al. indicate that sharp threshold phenomena per se are not so
surprising, but they do not settle the question of whether this simple system
will already show finite size scaling. The question is thus whether the curve
P[#heads > #tails] for low n just fluctuates more strongly or is indeed less
steep than that of a larger system. This question is settled by Figure 6.

Figure 6a shows the fraction of 10,000 instances of n tosses each where
more heads than tails were shown. The curves meet approximately at β = 0.5.
Plotting them against the rescaled parameter y = n^{0.5}(β − 0.5)/0.5 shows a
perfect universal scaling. This model is especially interesting since here also
the sharp threshold behavior results from asking a peculiar kind of question.
Instead of looking at the more natural question of P[heads], which is of course
identical to β, the behavior artificially becomes a sharp threshold behavior by
asking when it is more likely to see more heads than tails in any given system
size. Moreover, this most simple system also displays a finite size scaling
effect. Naturally, the corresponding exponent of 0.5 is the one dictated by the
law of large numbers. Thus, although a finite size scaling effect can be seen,
nobody would regard it as the effect of a phase transition since the exponent is
a trivial one. The next example is much more interesting since it shows a
non-trivial exponent.
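
The coin experiment is easy to reproduce. The following sketch is an illustration only; the function names, the default of 2000 instances per point and the use of Python's `random` module are assumptions made here, not details of the original simulation.

```python
import random


def fraction_more_heads(beta, n, instances=2000, rng=None):
    """Fraction of instances (n tosses each, heads probability beta)
    in which strictly more heads than tails are observed."""
    rng = rng or random.Random()
    wins = 0
    for _ in range(instances):
        heads = sum(rng.random() < beta for _ in range(n))
        if heads > n - heads:
            wins += 1
    return wins / instances


def rescaled(beta, n, beta_c=0.5, exponent=0.5):
    """Rescaled parameter y = n^exponent * (beta - beta_c) / beta_c."""
    return n ** exponent * (beta - beta_c) / beta_c
```

Plotting `fraction_more_heads(beta, n)` against `rescaled(beta, n)` for several n should make the curves collapse onto each other, as described above.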

4.2. The Coupon Collector's Problem

The simple system of coin tossing cannot easily be likened to 3-SAT. We will
thus introduce a second statistical problem called the coupon collector's problem:
let there be a set of n' distinguishable objects called coupons, identified by a
coupon ID from 1 to n'. Each coupon is contained multiple times in a large
multi-set and collectors can purchase coupons from this multi-set by drawing
one item uniformly at random. We will assume that each coupon ID has the
same probability of being drawn. The coupon collector's problem asks how many
draws are expected to be needed until each coupon ID has been drawn at least
once, i.e., it asks when the collection is completed. In essence, once the
collector has collected k different IDs, the chance of picking a new ID is
(n' − k)/n', and thus the expected time to find a new one is n'/(n' − k). Summing
over these expected times gives

\frac{n'}{n'} + \frac{n'}{n'-1} + \dots + \frac{n'}{1} = n' \left( \frac{1}{1} + \frac{1}{2} + \dots + \frac{1}{n'} \right) = n' H_{n'},

where H_{n'} denotes the n'-th harmonic number. This can be approximated by
n' ln n' + Γ n' + 1/2 + o(1), where Γ ≈ 0.57722 denotes the Euler-Mascheroni
constant. The variance is bounded from above by 2n'^2.
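
The expected completion time and its approximation can be checked numerically. The sketch below is illustrative; the function names are hypothetical, and coupon IDs run from 0 to n − 1 rather than from 1 to n' purely for convenience.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_draws(n):
    """Exact expectation n * H_n of the number of draws to complete a collection."""
    return n * sum(1.0 / i for i in range(1, n + 1))


def approx_expected_draws(n):
    """Approximation n ln n + gamma * n + 1/2 of the expectation."""
    return n * math.log(n) + EULER_GAMMA * n + 0.5


def draws_until_complete(n, rng=None):
    """Simulate one collector: draw uniformly until all n IDs were seen."""
    rng = rng or random.Random()
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws
```

Averaging `draws_until_complete(n)` over many runs should approach `expected_draws(n)`, and the approximation is already very close to the exact value for moderate n.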

For a set of x collections, we now define P[full, t] to be the fraction of
full collections after t draws. Of course, the number of draws depends on the
system's size. We thus define γ := t/(n' ln n' + Γ n' + 0.5) and plot P[full, γ]
against γ. Figure 7a shows the result for different system sizes from 10 to 1000
in dependence of γ. Interestingly, this looks like a phase transition at a
critical γ_c = 1. Furthermore, we define a rescaled parameter
z = n'°'^^ (γ − γ_c) against which we plot the functions, as shown in Figure 7b.

Note that the critical exponent is far away from the trivially expected 0.5.
We can now define two phases: full collections and incomplete collections. With
this, Figure 7 shows clearly that there exists a first-order phase transition
between the two phases. Or does it? Of course, a system as simple as the
coupon collector's problem does not meet the intuition about a system with a
phase transition, and in particular it cannot exhibit any non-trivial collective
behavior. Just defining that one condition of a system, i.e., whether a collection
is complete or not, represents two phases does not make them different phases.
Also, the finite size scaling effect cannot justify the notion of a phase
transition since it seems to be mainly an effect of the law of large numbers.

In the following we will highlight the connection between the coupon
collector's problem and the behavior of P[UNSAT] in 3-SAT.

4.2.1. Connection between random k-SAT and the coupon collector's problem

When $\alpha = 0$, each random $k$-SAT instance has exactly $2^n$ solutions.
Every added clause $C = \{l_1, l_2, \ldots, l_k\}$ excludes all solutions in
which the variables of all negated literals $l_i$ are assigned true and those
of all positive literals $l_j$ are assigned false. That is, each added clause
extinguishes a fraction of $2^{-k}$ of all possible assignments. Of course,
some of these might already have been extinguished by a clause added earlier.
An instance becomes unsatisfiable when all of its possible assignments have
been extinguished by some clause. Thus, the question is very similar to that of
the coupon collector's problem: in each time step we draw uniformly at random
$k$ literals that extinguish a fraction $2^{-k}$ of all possible assignments,
and we want to know when all possible assignments are extinguished. Of course,
there are two main differences: we draw more than one 'coupon' at once, namely
$2^{n-k}$, and moreover these are not independent of each other. The first
difference alone would just reduce the expected completion time by some factor,
but the effect of the second is harder to estimate.

Note that there is no interaction of any kind between the clauses. Given a
set of solutions $S$ that are left for some instance $I$, adding a clause leads
to the following reduced set of solutions $S'$: let $s \in S$ be any solution
that does not satisfy the newly added clause. This cannot be a solution of the
new instance, and thus it is removed from $S$. Now let $s \in S$ be some
solution that satisfies the newly added clause. Since it was contained in $S$,
the assignment given by $s$ satisfies at least one literal in each of the
clauses added so far, plus at least one in the newly added clause. Thus, this
solution is in $S'$. The clauses are independent of each other in the sense
that the only solutions extinguished by a clause are those that do not satisfy
it. There is no cumulative effect of the clauses such that after adding some of
them a whole avalanche of solutions is extinguished. Note, however, that the
solutions in $S$ are not independent of each other: if $s \in S$, other
solutions $s'$ with a low Hamming distance to $s$ have a higher probability of
being in $S$ than those with a large distance.
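The counting argument of this subsection can be verified by brute force on a small instance. The sketch below is illustrative only; the clause representation as (variable index, negated flag) pairs and the assumption that a clause contains k distinct variables are choices made here. It enumerates all assignments, checks that one clause excludes exactly $2^{n-k}$ of them, and shows that the reduced solution set is simply the old set minus the assignments violating the new clause.

```python
from itertools import product
import random


def random_clause(n, k, rng):
    """Pick k distinct variables; negate each with probability 1/2."""
    variables = rng.sample(range(n), k)
    return [(v, rng.random() < 0.5) for v in variables]


def violates(assignment, clause):
    """True if every literal of the clause is false under the assignment.

    A negated literal is false when its variable is True, and a positive
    literal is false when its variable is False.
    """
    return all(assignment[v] == negated for v, negated in clause)


def add_clause(solutions, clause):
    """Reduced solution set: keep only solutions that satisfy the clause."""
    return [s for s in solutions if not violates(s, clause)]


def all_assignments(n):
    """All 2^n Boolean assignments of n variables."""
    return list(product([False, True], repeat=n))
```

Because `add_clause` is a plain filter, the order in which clauses are added does not matter, which is the sense in which the clauses do not interact.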

4.3. A toy model for 3-SAT

Neither the coupon collector's problem nor coin tossing displays one of the
main qualitative behaviors of 3-SAT. The main point of interest is the gap
between $\alpha = 5.19$, at which the average number of solutions reaches 1,
and the point $\alpha_c$ at which most instances are already unsatisfiable. In
the following, we introduce a toy model that shows this more involved behavior
but is still quite simple and not likely to have a real phase transition. The
toy model is based on the following idea: an instance is represented by a
number, starting with $2^n$. This represents the number of solutions left at a
given $\alpha$. Adding a clause is mimicked by multiplying this number by some
reduction factor.
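The value 5.19 follows from a short calculation: a random clause with k literals is violated by a fraction $2^{-k}$ of all assignments, so the average number of solutions after $\alpha n$ clauses is $2^n (1 - 2^{-k})^{\alpha n}$, which equals 1 at $\alpha = \ln 2 / \ln(1/(1 - 2^{-k}))$. The sketch below evaluates this; the function names are chosen here for illustration.

```python
import math


def mean_solutions(n, alpha, k=3):
    """Average number of solutions 2^n * (1 - 2^-k)^(alpha * n)."""
    return 2.0 ** n * (1.0 - 2.0 ** -k) ** (alpha * n)


def alpha_mean_one(k=3):
    """Clause-to-variable ratio at which the average number of solutions is 1."""
    return math.log(2.0) / -math.log(1.0 - 2.0 ** -k)
```

For k = 3 the reduction factor is 7/8 and `alpha_mean_one()` evaluates to approximately 5.19, independent of n.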

Of course, simply multiplying the number by 7/8 is already enough to produce
the average number of solutions shown in Figure 2 and also a sharp threshold
behavior of P[UNSAT]. Unfortunately, the latter takes place at
$\alpha = 5.19$. Looking at the real reduction factor, it turns out that its
distribution broadens with $\alpha$ and is shifted to the right. We used this
observation for the toy model of random 3-SAT, in which we draw a
multiplicative factor from a normal distribution with a standard deviation
$\sigma \approx 0.0585 \cdot \alpha$ and an average $\mu(\alpha)$ given by

$$\mu(\alpha) = \begin{cases} 0.875 + 0.009 \cdot \alpha, & \alpha < 3.8 \\ 0.875 + 0.170 \cdot \alpha, & \alpha > 3.8 \end{cases}$$

If the drawn number is lower than 0 or higher than 1, we set it to 0 or 1,
respectively. This factor is then multiplied with the current number of the
toy model instance. An instance of the toy model represents an unsatisfiable
instance if its number drops below 1. Thus, $P_{toy}[UNSAT, \alpha]$ gives the
fraction of toy model instances at $\alpha$ whose number is below 1.
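The procedure described above can be sketched as a small simulation. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the mean and standard deviation of the factor are passed in as functions of the current ratio, and evaluating them at the ratio m/n reached after the m-th clause is an assumption made here.

```python
import random


def clip01(x):
    """Set values below 0 to 0 and values above 1 to 1."""
    return min(1.0, max(0.0, x))


def toy_unsat_fraction(n, alpha, instances, mean_fn, sd_fn, seed=0):
    """Fraction of toy instances whose number has dropped below 1 at alpha.

    mean_fn and sd_fn map the current clause-to-variable ratio to the mean
    and standard deviation of the normally distributed reduction factor.
    """
    rng = random.Random(seed)
    clauses = int(round(alpha * n))
    unsat = 0
    for _ in range(instances):
        number = 2.0 ** n  # every instance starts with 2^n "solutions"
        for m in range(1, clauses + 1):
            ratio = m / n  # assumption: parameters use the current ratio
            number *= clip01(rng.gauss(mean_fn(ratio), sd_fn(ratio)))
            if number < 1.0:  # factors never exceed 1, so it stays below 1
                break
        if number < 1.0:
            unsat += 1
    return unsat / instances
```

With a constant factor of 7/8 and zero spread, `toy_unsat_fraction` switches from 0 to 1 between ratios slightly below and slightly above 5.19, which is the deterministic baseline mentioned in the text. The broadening distribution is obtained by supplying the corresponding functions for `mean_fn` and `sd_fn`.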

In Figures 8 and 9 we show our simulation results for the toy model defined
above. According to Figure 8a, the average number of solutions follows the
same exponential behavior as expected from (1), and $\langle s \rangle$ drops
below 1 at $\alpha = 5.19$. Surprisingly, a sharp threshold behavior can be
observed when plotting P[UNSAT] as a function of $\alpha$, as shown in
Figure 8b-c. Similar to 3-SAT, the transition point of the threshold behavior
at $\alpha = 4.76$ is separated from the point where $\langle s \rangle = 1$ by
a non-negligible gap. Furthermore, the distribution of the numbers,
$P_{toy}[s, \alpha]$, is best described by a lognormal distribution and shows
the same universal scaling behavior as the real $P[s, \alpha]$ distribution, as
displayed in Figure 9.

In summary, this toy model shows the same qualitative properties as the real 
3-SAT system. 



5. Summary 

In this article we have raised the question of whether or not the sharp
threshold phenomenon displayed by P[UNSAT] around $\alpha = 4.2$ is a mere
statistical event that does not relate to a phase transition in the classical
sense. Our intuition is that there is no interaction of the elements of a
Boolean instance, i.e., clauses, variables, or solutions, that leads to this
phenomenon. We also see no difference in principle between instances with at
least 1 solution and those with no solution. We thus believe that the sharp
threshold behavior of P[UNSAT] can rather be likened to the sharp threshold
phenomena in simpler systems, like
the coupon collector's problem. Of course, it is obvious that approaches from
statistical physics were successful in describing 3-SAT and that some of these
results led to the most powerful SAT solver, based on survey propagation.
It is important to note that we do not question the phase transition shown for
other order parameters like backbone size, clustering of the solution space, or
the order parameter associated with the messages in survey propagation, but
only P[UNSAT] as an order parameter of a real phase transition.

We conclude by describing one of the possibly many examples where asking
somewhat different questions about the states of the same system may easily
lead to the conclusion that more than one observable transition (and of
different kinds) takes place in the system, even though it is widely accepted
that there is only a single relevant transition in it.

Consider the Ising model on a face-centered cubic lattice. As the system 
cools down from high temperatures, we ask two simple questions (without loss 
of generality we can assume that for low temperatures the up spins take over): 

1. What is the total spontaneous magnetization of the system? (ratio of up 
spins minus the ratio of down spins) 

2. Is there a percolating cluster of down spins present? 

The (textbook level) answers are: 

1. Below a critical temperature $T_I$, the spontaneous magnetization sharply
increases as the number of up spins starts to grow quickly. The associated
transition is a prototype of continuous phase transitions (involving
fluctuations, etc.).

2. At a temperature below $T_I$, the probability that a percolating (connected
infinite) cluster of down spins is present suddenly drops from 1 to 0 (as if
a first-order transition were taking place).

We suggest that the lesson from this analogy is the following: the answer 
one gets depends very much on the question. Our conclusion is that it remains 
to be demonstrated that asking "What is the probability of having a satisfiable 
instance in 3-SAT?" is the right question. We argue that this particular question 
(order parameter) is not closely related to the variety of possible rich transitions 
taking place in this paradigmatic satisfiability problem. 

Is the question of whether or not P[UNSAT] actually undergoes a phase
transition more than a simple question of how to name something? In this
interdisciplinary field it is very important to be careful with terms; a phase
transition is more than just a sharp threshold phenomenon and requires proof
that the supposed phases behave differently in some aspect that is independent
of their definition. The simple stochastic systems presented here stress the
point that a sharp threshold phenomenon, even if accompanied by a non-trivial
finite size scaling effect, is not enough to show a genuine phase transition:
an independent proof of two different phases is needed in addition. We hope
that this article will trigger a discussion about the observations that must be
made before categorizing a sharp threshold phenomenon as a non-trivial phase
transition, and thereby support ongoing interdisciplinary research in this
field.
Figure 1: a) Scaling of P[UNSAT] against $y = N^{0.66}(\alpha - 4.12)/4.12$.
b) Scaling of P[UNSAT] against $y = N^{0.66}(\alpha - 4.267)/4.267$.

Figure 3: The rescaled cumulative distribution of the number of solutions of
satisfiable instances over a large range from $\alpha = 4.0$ to $4.5$ and
system sizes from $n = 30$ to $100$. The $\mu$ and $\sigma$ denote the two
fitting parameters of the lognormal distribution used for the rescaling.




Figure 4: The quenched average of the number of solutions.

Figure 5: a) Average hardness of proving that an instance is unsatisfiable,
measured as the size of the decision tree in a DPL-like algorithm. b) Average
hardness of finding the first solution of satisfiable instances. c) Average
hardness of finding all solutions of satisfiable instances. d) Average fraction
of the decision tree that is traversed until the first solution is found.




Figure 6: a) Probability P[#heads > #tails] that more heads than tails are
tossed in n tosses with a biased coin that shows heads with probability
$\beta$ and tails with probability $1 - \beta$. b) P[#heads > #tails] as a
function of the rescaled parameter y.

Figure 7: a) Percentage of full collections as a function of
$\gamma = t/(n \log n + 0.577 n + 0.5)$. b) Percentage of full collections as
a function of the rescaled parameter y.




Figure 8: a) The average number of solutions in the toy model. As a reference,
the expected curve for real 3-SAT instances as described by equation (1) is
also given. b) $P_{toy}[UNSAT, \alpha]$ for the toy model. c)
$P_{toy}[UNSAT, \alpha]$ against the rescaled parameter
$y = N^{0.\ldots}(\alpha - 4.76)/4.76$, i.e., the critical $\alpha_c(toy)$ is
4.76.

Figure 9: Universal scaling of the cumulative distribution of the number of
solutions in the toy model. The $\mu$ and $\sigma$ denote the two fitting
parameters of the lognormal distribution used for the rescaling.